Application of metabolomics to plant genotype discrimination using statistics and machine learning

نویسندگان

  • Janet Taylor
  • Ross D. King
  • Thomas Altmann
  • Oliver Fiehn
چکیده

MOTIVATION Metabolomics is a post genomic technology which seeks to provide a comprehensive profile of all the metabolites present in a biological sample. This complements the mRNA profiles provided by microarrays, and the protein profiles provided by proteomics. To test the power of metabolome analysis we selected the problem of discrimating between related genotypes of Arabidopsis. Specifically, the problem tackled was to discrimate between two background genotypes (Col0 and C24) and, more significantly, the offspring produced by the crossbreeding of these two lines, the progeny (whose genotypes would differ only in their maternally inherited mitichondia and chloroplasts). OVERVIEW A gas chromotography--mass spectrometry (GCMS) profiling protocol was used to identify 433 metabolites in the samples. The metabolomic profiles were compared using descriptive statistics which indicated that key primary metabolites vary more than other metabolites. We then applied neural networks to discriminate between the genotypes. This showed clearly that the two background lines can be discrimated between each other and their progeny, and indicated that the two progeny lines can also be discriminated. We applied Euclidean hierarchical and Principal Component Analysis (PCA) to help understand the basis of genotype discrimination. PCA indicated that malic acid and citrate are the two most important metabolites for discriminating between the background lines, and glucose and fructose are two most important metabolites for discriminating between the crosses. These results are consistant with genotype differences in mitochondia and chloroplasts.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying its involved biomarkers. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profile. This dataset suffers from the problem of imbalanced classes...

متن کامل

Application of ensemble learning techniques to model the atmospheric concentration of SO2

In view of pollution prediction modeling, the study adopts homogenous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of Sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...

متن کامل

Forecasting the Tehran Stock market by Machine ‎Learning Methods using a New Loss Function

Stock market forecasting has attracted so many researchers and investors that ‎many studies have been done in this field. These studies have led to the ‎development of many predictive methods, the most widely used of which are ‎machine learning-based methods. In machine learning-based methods, loss ‎function has a key role in determining the model weights. In this study a new loss ‎function is ...

متن کامل

Machine Learning Models for Housing Prices Forecasting using Registration Data

This article has been compiled to identify the best model of housing price forecasting using machine learning methods with maximum accuracy and minimum error. Five important machine learning algorithms are used to predict housing prices, including Nearest Neighbor Regression Algorithm (KNNR), Support Vector Regression Algorithm (SVR), Random Forest Regression Algorithm (RFR), Extreme Gradient B...

متن کامل

Intelligent application for Heart disease detection using Hybrid Optimization algorithm

Prediction of heart disease is very important because it is one of the causes of death around the world. Moreover, heart disease prediction in the early stage plays a main role in the treatment and recovery disease and reduces costs of diagnosis disease and side effects it. Machine learning algorithms are able to identify an effective pattern for diagnosis and treatment of the disease and ident...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 18 Suppl 2  شماره 

صفحات  -

تاریخ انتشار 2002